HuggingFace Daily AI Paper Digest (HuggingFace 每日AI论文速递)
2025.11.11 | Small windows with frequent summaries refresh deep research; casting a wide net before tackling hard problems boosts competitive coding

Update: 2025-11-11
Description

The 13 papers in this episode are:

[00:25] 🧩 IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction

[01:16] 🏆 DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation

[02:03] 🔬 The Station: An Open-World Environment for AI-Driven Discovery

[02:43] 🚀 RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services

[03:15] 🧠 SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization

[03:53] 🧭 Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs

[04:30] 🔍 Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads

[05:10] 🎬 MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs

[05:50] 🎨 MPJudge: Towards Perceptual Assessment of Music-Induced Paintings

[06:57] 🔄 RLoop: An Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization

[07:36] 🤖 Robot Learning from a Physical World Model

[08:21] 🛠 NURBGen: High-Fidelity Text-to-CAD Generation through LLM-Driven NURBS Modeling

[08:52] 🚀 SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?

[Follow Us]

You can also find us on the following platform for more information beyond the podcast:

Xiaohongshu: AI速递
